7 research outputs found

    Towards Automatic and Adaptive Optimizations of MPI Collective Operations

    Message passing is one of the most commonly used paradigms of parallel programming. The Message Passing Interface, MPI, is a standard used in scientific and high-performance computing. Collective operations are the subset of the MPI standard that deals with process synchronization, data exchange, and computation among a group of processes. Collective operations are commonly used and can be an application performance bottleneck. The performance of collective operations depends on many factors, some of which are the input parameters (e.g., communicator and message size); system characteristics (e.g., interconnect type); the application computation and communication pattern; and internal algorithm parameters (e.g., internal segment size). We refer to an algorithm and its internal parameters as a method. The goal of this dissertation is a performance improvement of MPI collective operations and of applications that use them. In our framework, during a collective call, a system-specific decision function is invoked to select the most appropriate method for the particular collective instance. This dissertation focuses on automatic techniques for system-specific decision function generation. Our approach takes the following steps: first, we collect method performance information on the system of interest; second, we analyze this information using parallel communication models, graphical encoding methods, and decision trees; third, based on the previous step, we automatically generate the system-specific decision function to be used at run time. In situations where detailed performance measurement is not feasible, method performance models can be used to supplement the measured method performance information. We build and evaluate parallel communication models of 35 different collective algorithms. These models are built on top of three commonly used point-to-point communication models: Hockney, LogGP, and PLogP. We use the method performance information on a system to build quadtrees and C4.5 decision trees of variable sizes and accuracies. The collective method selection functions are then generated automatically from these trees. Our experiments show that quadtrees of three or four levels are often enough to approximate the experimentally optimal decision with a small mean performance penalty (less than 10%). The C4.5 decision trees are even more accurate (with a mean performance penalty of less than 5%). The size and accuracy of C4.5 decision trees can be further improved with the use of appropriate composite attributes (such as “total message size” or “even communicator size”). Finally, we apply these techniques to tune the collective operations on the Grig cluster at the University of Tennessee and to improve application performance on the Cray XT4 system at Oak Ridge National Laboratory. The tuned collective achieves more than a 40% mean performance improvement over the native broadcast implementation. Using the platform-specific reduce on the Cray XT4 led to a 10% improvement in overall application performance. Our results show that the methods we explored are both applicable and effective for system-specific optimization of collective operations and are a step toward automatically tunable, adaptive MPI collectives.
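
    The run-time selection step described above can be pictured as a small piece of branching code emitted from the learned decision tree. Below is a minimal, hypothetical C sketch of such a system-specific decision function for broadcast; the algorithm names, thresholds, segment sizes, and the bcast_method_t type are illustrative assumptions, not the dissertation's actual generated output.

```c
#include <stddef.h>

/* Hypothetical sketch of an auto-generated, system-specific decision
 * function for MPI_Bcast.  A real generated function would encode the
 * quadtree or C4.5 decision tree learned from measurements on the
 * target system; the thresholds below are placeholders. */
typedef struct {
    enum { BCAST_BINOMIAL, BCAST_SPLIT_BINTREE, BCAST_PIPELINE } algorithm;
    size_t segment_size;            /* internal parameter; 0 = no segmentation */
} bcast_method_t;

static bcast_method_t select_bcast_method(int comm_size, size_t msg_size)
{
    bcast_method_t m;
    if (msg_size < 2048) {          /* small messages: latency-bound   */
        m.algorithm = BCAST_BINOMIAL;
        m.segment_size = 0;
    } else if (comm_size <= 16) {   /* small communicators             */
        m.algorithm = BCAST_SPLIT_BINTREE;
        m.segment_size = 8192;
    } else {                        /* large messages, large groups    */
        m.algorithm = BCAST_PIPELINE;
        m.segment_size = 32768;
    }
    return m;
}
```

    At run time, the collective wrapper would call such a function once per invocation and dispatch to the chosen algorithm with the chosen segment size.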

    Scalable fault tolerant MPI: extending the recovery algorithm (Euro PVM/MPI)

    Abstract. Fault Tolerant MPI (FT-MPI) [6] was designed as a solution that allows applications different methods of handling process failures beyond simple checkpoint-restart schemes. The initial implementation of FT-MPI included a robust, heavyweight system-state recovery algorithm designed to manage the membership of MPI communicators during multiple failures. The algorithm and its implementation, although robust, were very conservative, and this affected their scalability on both very large clusters and distributed systems. This paper details the FT-MPI recovery algorithm and our initial experiments with new recovery algorithms that aim to be both scalable and latency tolerant. Our conclusions show that the combined use of topology-aware collective communication and distributed consensus algorithms produces the best results.
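
    As a conceptual illustration only (not the FT-MPI implementation), the membership-agreement step of such a recovery can be reduced to a bitwise AND over the survivors' views of which ranks are alive: a rank remains in the recovered communicator only if every survivor still sees it. The sketch below performs this combination in a single address space; in the real system it would be carried out by a topology-aware, fault-tolerant reduction across the survivors.

```c
#include <stdint.h>
#include <string.h>

/* views  : nsurvivors x nprocs matrix; views[s * nprocs + r] == 1 if
 *          survivor s believes rank r is still alive.
 * agreed : output; agreed[r] == 1 only if every survivor agrees.      */
void agree_on_membership(int nsurvivors, int nprocs,
                         const uint8_t *views, uint8_t *agreed)
{
    memset(agreed, 1, (size_t)nprocs);
    for (int s = 0; s < nsurvivors; s++)
        for (int r = 0; r < nprocs; r++)
            agreed[r] &= views[s * nprocs + r];
}
```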

    Flexible collective communication tuning architecture applied to Open MPI

    Abstract. Collective communications are invaluable to modern high-performance applications, although most users of these communication patterns do not always want to know their innermost workings. The implementation of the collectives is often left to the middleware developer, such as those providing an MPI library. As many of these libraries are designed to be both generic and portable, MPI developers commonly offer internal tuning options, suitable only for knowledgeable users, that allow some level of customization. The work presented in this paper aims not only to provide a very efficient set of collective operations for use with the Open MPI implementation but also to make their control and tuning straightforward and flexible. Additionally, this paper demonstrates a novel example of the proposed framework's flexibility by dynamically tuning an MPI_Alltoallv algorithm at runtime.
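
    A minimal sketch of what such a tunable-collective layer might look like follows; the type names, the dispatch table, and the size-based rule are assumptions for illustration, not Open MPI's actual interfaces. The point is that the decision rule is ordinary data the framework can replace at run time, for example while it re-times MPI_Alltoallv during the run.

```c
#include <stddef.h>

/* One entry per candidate implementation of the collective.           */
typedef int (*alltoallv_impl_t)(const void *sendbuf, const int *sendcounts,
                                void *recvbuf, const int *recvcounts,
                                int comm_size);
typedef struct {
    const char      *name;   /* e.g. "linear", "pairwise"               */
    alltoallv_impl_t fn;
} alltoallv_entry_t;

/* Example decision rule: index into the table by total payload size.
 * A dynamic tuner could adjust 'threshold' (or swap the rule entirely)
 * as it observes timings during the run.                               */
static size_t pick_alltoallv(const int *sendcounts, int comm_size,
                             size_t threshold)
{
    size_t total = 0;
    for (int i = 0; i < comm_size; i++)
        total += (size_t)sendcounts[i];
    return (total < threshold) ? 0 : 1;
}
```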

    SEMANTICS, DESIGN AND APPLICATIONS FOR HIGH PERFORMANCE COMPUTING

    With increasing numbers of processors on current machines, the probability of node or link failures is also increasing. Therefore, application-level fault tolerance is becoming an increasingly important issue for both end users and the institutions running the machines. In this paper we present the semantics of a fault-tolerant version of the Message Passing Interface (MPI), the de facto standard for communication in scientific applications, which gives applications the possibility of recovering from a node or link error and continuing execution in a well-defined way. We present the architecture of fault-tolerant MPI, an implementation of MPI using the semantics presented above, as well as benchmark results with various applications. An example of a fault-tolerant parallel equation solver, performance results, and the time needed to recover from a process failure are furthermore detailed.
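
    The application-side pattern these semantics enable can be sketched roughly as follows, using only standard MPI calls; the recovery step itself is left as a comment because its exact calls and modes are specific to the fault-tolerant MPI implementation described in the paper.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* Errors are reported to the application instead of aborting it.  */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    double local = 1.0, global = 0.0;
    int converged = 0;
    while (!converged) {
        int rc = MPI_Allreduce(&local, &global, 1, MPI_DOUBLE,
                               MPI_SUM, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            /* A process failed.  The fault-tolerant MPI would now let
             * the application rebuild or shrink the communicator
             * according to its chosen recovery mode, restore the lost
             * rank's data (e.g. from a checkpoint or a neighbour's
             * copy), and redo the interrupted iteration.               */
            continue;
        }
        converged = (global > 0.0);     /* placeholder convergence test */
    }

    MPI_Finalize();
    return 0;
}
```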

    Self-adapting numerical software (SANS) effort

    The challenge for the development of next-generation software is the successful management of the complex computational environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-Adapting Numerical Software (SANS) systems are intended to meet this significant challenge. The process of arriving at an efficient numerical solution of problems in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels: algorithmic decisions; management of the parallel environment; and processor-specific tuning of kernels. Additionally, at any of these levels we can decide to rearrange the user's data. In this paper we look at a number of efforts at the University of Tennessee that are investigating these areas.
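
    Purely as an illustration of the first (algorithmic-decision) level, a self-adapting layer might inspect coarse problem features and pick a solution strategy; the features, thresholds, and solver names below are hypothetical and not part of the SANS systems described above.

```c
#include <stddef.h>

typedef enum { SOLVER_DIRECT, SOLVER_CG, SOLVER_GMRES } solver_t;

/* Toy algorithmic decision: small or fairly dense systems go to a
 * direct solver, sparse symmetric positive-definite systems to CG,
 * everything else to GMRES.  A SANS-style system would base this on
 * measured or modelled performance rather than fixed thresholds.      */
static solver_t choose_solver(size_t n, double fill_ratio, int symmetric_pd)
{
    if (n < 5000 || fill_ratio > 0.25)
        return SOLVER_DIRECT;
    if (symmetric_pd)
        return SOLVER_CG;
    return SOLVER_GMRES;
}
```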

    A Multiscale Model for Avascular Tumor Growth

    The desire to understand tumor complexity has given rise to mathematical models to describe the tumor microenvironment. We present a new mathematical model for avascular tumor growth and development that spans three distinct scales. At the cellular level, a lattice Monte Carlo model describes cellular dynamics (proliferation, adhesion, and viability). At the subcellular level, a Boolean network regulates the expression of proteins that control the cell cycle. At the extracellular level, reaction-diffusion equations describe the chemical dynamics (nutrient, waste, growth promoter, and inhibitor concentrations). Data from experiments with multicellular spheroids were used to determine the parameters of the simulations. Starting with a single tumor cell, this model produces an avascular tumor that quantitatively mimics experimental measurements in multicellular spheroids. Based on the simulations, we predict: 1) the microenvironmental conditions required for tumor cell survival; and 2) that growth promoters and inhibitors have diffusion coefficients in the range between 10⁻⁶ and 10⁻⁷ cm²/h, corresponding to molecules of size 80–90 kDa. Using the same parameters, the model also accurately predicts spheroid growth curves under different external nutrient supply conditions.
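
    At the extracellular level, the chemical fields in models of this kind typically obey reaction-diffusion equations of the generic form below (a sketch of the general form, not the paper's exact system), where c is a concentration such as nutrient or growth promoter.

```latex
% Generic reaction-diffusion form for one chemical species c:
% diffusion with coefficient D_c, cell-dependent production P_c,
% and first-order consumption/decay with rate \lambda_c.
\frac{\partial c}{\partial t} = D_c \nabla^{2} c + P_c(\mathbf{x}, t) - \lambda_c\, c
```

    The diffusion coefficients in the 10⁻⁶ to 10⁻⁷ cm²/h range quoted above would enter such equations as the D_c of the growth promoter and inhibitor species.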